AITopics | step size 0

Collaborating Authors

step size 0

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Trajectory-Level Data Augmentation for Offline Reinforcement Learning

Schmähling, Tobias, Burkhardt, Matthias, Windisch, Tobias

arXiv.org Machine LearningMay-14-2026

We propose a data augmentation method for offline reinforcement learning, motivated by active positioning problems. Particularly, our approach enables the training of off-policy models from a limited number of suboptimal trajectories. We introduce a trajectory-based augmentation technique that exploits task structure and the geometric relationship between rewards, value functions, and mathematical properties of logging policies. During data collection, our augmentation supports suboptimal logging policies, leading to higher data quality and improved offline reinforcement learning performance. We provide theoretical justification for these strategies and validate them empirically across positioning tasks of varying dimensionality and under partial observability.

machine learning, reinforcement learning, step size 0, (12 more...)

arXiv.org Machine Learning

2605.13401

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

78421a2e0e1168e5cd1b7a8d23773ce6-Supplemental.pdf

Neural Information Processing SystemsFeb-9-2026, 10:52:37 GMT

ar-adals, best calibration performance, ensemble model, (12 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

A Implementation Details

Neural Information Processing SystemsAug-15-2025, 07:19:57 GMT

For all the experimental results on ResNet-29 v2 (He et al., 2016b), we use a batch size The network is trained with Adam optimizer (Kingma et al., 2015) for 200 epochs. We randomly split the training dataset into training data of 45000 images and 5000 images as the validation set. We train a Wide ResNet-28-10 v2 (Zagoruyko & Komodakis, 2016) to obtain the state-of-the-art accuracy for CIFAR-10 (e.g., Table 2 in the main text). For mixup (Zhang et al., 2018; Thulasidasan et al., 2019), the mixing parameter of two images is For CCA T (Stutz et al., 2020), we observe that training models with adversarial examples bounded We train a Wide ResNet-28-10 v2 (Zagoruyko & Komodakis, 2016) to obtain the state-of-the-art accuracy for CIFAR-100. All the experiments on ImageNet were obtained via training a ResNet-101 v1 (He et al., 2016a) following the training script at The input image is normalized (divided by 255) to be within [0,1].

ar-adals, best calibration performance, ensemble model, (12 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

70d31b87bd021441e5e6bf23eb84a306-Supplemental.pdf

Neural Information Processing SystemsAug-15-2025, 03:14:11 GMT

experiment, hyperparameter, proposition 4, (13 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback

New logarithmic step size for stochastic gradient descent

Shamaee, M. Soheil, Hafshejani, S. Fathi, Saeidian, Z.

arXiv.org Artificial IntelligenceApr-1-2024

Stochastic gradient descent (SGD), which dates back to the work by Robbins and Monro Robbins and Monro [1951a] is widely observed in training modern Deep Neural Networks (DNNs), which are widely used to achieve state-of-the-art results in multiple problem domains like image classification problems Krizhevsky et al. [2017, 2009], object detection Redmon and Farhadi [2017], and classification automatic machine translation Zhang et al. [2015]. The value of the step size (or learning rate) is crucial for the convergence rate of SGD. Selecting an appropriate step size value in each iteration ensures that SGD iterations converge to an optimal solution. If the step size value is too large, it may prevent SGD iterations from reaching the optimal point. Conversely, excessively small step size values can lead to slow convergence or mistakenly identify a local minimum as the optimal solution Mishra and Sarawadekar [2019]. To address these challenges, various schemes have been proposed. One popular approach is the Armijo line search method, initially introduced for SGD by Vaswani et al. Vaswani et al. [2019], which provides theoretical results for strong-convex, convex, and non-convex objective functions.

dataset, logarithmic step size, step size, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/s11704-023-3245-z

2404.01257

Country: Asia > Middle East > Iran > Fars Province > Shiraz (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Add feedback

Permutation Compressors for Provably Faster Distributed Nonconvex Optimization

Szlendak, Rafał, Tyurin, Alexander, Richtárik, Peter

arXiv.org Machine LearningOct-7-2021

We study the MARINA method of Gorbunov et al (2021) -- the current state-of-the-art distributed non-convex optimization method in terms of theoretical communication complexity. Theoretical superiority of this method can be largely attributed to two sources: the use of a carefully engineered biased stochastic gradient estimator, which leads to a reduction in the number of communication rounds, and the reliance on {\em independent} stochastic communication compression operators, which leads to a reduction in the number of transmitted bits within each communication round. In this paper we i) extend the theory of MARINA to support a much wider class of potentially {\em correlated} compressors, extending the reach of the method beyond the classical independent compressors setting, ii) show that a new quantity, for which we coin the name {\em Hessian variance}, allows us to significantly refine the original analysis of MARINA without any additional assumptions, and iii) identify a special class of correlated compressors based on the idea of {\em random permutations}, for which we coin the term Perm$K$, the use of which leads to $O(\sqrt{n})$ (resp. $O(1 + d/\sqrt{n})$) improvement in the theoretical communication complexity of MARINA in the low Hessian variance regime when $d\geq n$ (resp. $d \leq n$), where $n$ is the number of workers and $d$ is the number of parameters describing the model we are learning. We corroborate our theoretical results with carefully engineered synthetic experiments with minimizing the average of nonconvex quadratics, and on autoencoder training with the MNIST dataset.

permk, permutation compressor, randk, (12 more...)

arXiv.org Machine Learning

2110.033

Country:

Asia > Middle East > Saudi Arabia (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > United Kingdom (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Add feedback

Scalable MCMC for Mixed Membership Stochastic Blockmodels

Li, Wenzhe, Ahn, Sungjin, Welling, Max

arXiv.org Machine LearningOct-21-2015

We propose a stochastic gradient Markov chain Monte Carlo (SG-MCMC) algorithm for scalable inference in mixed-membership stochastic blockmodels (MMSB). Our algorithm is based on the stochastic gradient Riemannian Langevin sampler and achieves both faster speed and higher accuracy at every iteration than the current state-of-the-art algorithm based on stochastic variational inference. In addition we develop an approximation that can handle models that entertain a very large number of communities. The experimental results show that SG-MCMC strictly dominates competing algorithms in all cases.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

1510.04815

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback